Make text-to-fact criteria directive: exhaustive extraction + typed-literal compound terms by justinjoy · Pull Request #150 · semantic-reasoning/factlog

justinjoy · 2026-06-26T11:28:48Z

Two prompt-hardening changes to skills/factlog/references/text-to-fact.md,
the authoritative extraction criteria. Both turn a soft "may" (a discretionary
instruction the extractor reliably skips) into a "must, when X".

1. Exhaustive extraction (완전성 원칙, "completeness principle")

Dense tables (participant rosters, financial/registry status, budget line
items, schedules, career/patent records) are the highest-density fact source,
yet the prior criteria only said "record relation candidates." In practice the
extractor skimmed prose and dropped repeated table rows: a real proposal with
~400 extractable facts yielded ~90 (≈20–25% coverage).

- forbid sampling of repeated items ("대표 몇 개만", "just a few representative ones" → extract all N)
- table → triple mapping rule (row key → subject, header → relation, cell → object)
- judge coverage by a section/table sweep, not by converted-file byte size
- pre-finish self-check, with the existing PII exclusions preserved
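A minimal Python sketch of the row-by-row table → triple mapping and the pre-finish self-check. The table representation (a header list plus row lists), the `Triple` type, and the `excluded` header set standing in for the PII exclusions are hypothetical illustrations, not factlog code; the PR itself changes only the prose criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


def table_to_triples(headers, rows, key_col=0, excluded=frozenset()):
    """Sweep every row (no sampling): row key -> subject, header -> relation, cell -> object.

    Columns named in `excluded` (a hypothetical stand-in for the PII exclusions)
    and empty cells are skipped.
    """
    triples = []
    for row in rows:
        subject = row[key_col].strip()
        for col, (header, cell) in enumerate(zip(headers, row)):
            if col == key_col or header in excluded or not cell.strip():
                continue
            triples.append(Triple(subject, header, cell.strip()))
    return triples


def uncovered_rows(rows, triples, key_col=0):
    """Pre-finish self-check: row keys that produced no triple at all."""
    covered = {t.subject for t in triples}
    return [row[key_col].strip() for row in rows if row[key_col].strip() not in covered]


headers = ["name", "role", "phone"]
rows = [
    ["Researcher A", "PI", "555-0100"],
    ["Researcher B", "co-PI", ""],
    ["Researcher C", "", "555-0102"],
]
triples = table_to_triples(headers, rows, excluded={"phone"})
for t in triples:
    print(t)
print("rows with no triple:", uncovered_rows(rows, triples))
```

Coverage here is judged per row rather than by output size: the self-check flags "Researcher C" because its only non-excluded cell is empty, which is exactly the kind of row a skim would silently drop.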

2. Typed-literal compound terms (재량 아님, "not discretionary")

Date/amount/ordinal/number objects left as prose strings ("2017.03.08",
"126백만원", i.e. 126 million won) can't be sorted or thresholded by the engine.
Left to discretion, the extractor never emits compound terms (observed: 0
across a full sync).

- require date()/ordinal()/amount()/number() compound terms for typed literals,
  with a prose → term mapping table
- engine-support caveats, stated plainly: date() and ordinal() fully project;
  amount() takes a positive integer and needs a unit table (use number() for
  negatives such as a loss); number() projection is still pending (#125,
  "feat(typed): number-type comparison (engine has no float text column)"),
  but emit it anyway so the structure is in place
- cross-reference attribute-relations.md / typed-relations.md so that declared
  relations actually project and compare
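A sketch of the prose → term mapping, assuming a Python pre-pass over extracted object strings. The tagged tuples only stand in for the date()/amount()/ordinal()/number() compound terms; their real argument shape is defined in text-to-fact.md and typed-relations.md, which are not reproduced here. The regex patterns, the won unit table (`WON_UNITS`), and the "KRW" tag are hypothetical.

```python
import re
from datetime import date

# Hypothetical unit table (the PR only says amount() "needs a unit table").
WON_UNITS = {"원": 1, "천원": 1_000, "만원": 10_000, "백만원": 1_000_000, "억원": 100_000_000}
_UNIT_ALT = "|".join(sorted(WON_UNITS, key=len, reverse=True))  # try longest suffix first

DATE_RE = re.compile(r"^(\d{4})[.\-/](\d{1,2})[.\-/](\d{1,2})\.?$")
AMOUNT_RE = re.compile(rf"^(-?\d[\d,]*)\s*({_UNIT_ALT})$")
ORDINAL_RE = re.compile(r"^제?(\d+)\s*(?:차|번째)$")  # e.g. "제3차" = 3rd round
NUMBER_RE = re.compile(r"^-?\d[\d,]*(?:\.\d+)?$")


def to_typed(text):
    """Map a prose literal to a tagged tuple standing in for a compound term."""
    s = text.strip()
    if m := DATE_RE.match(s):
        d = date(*map(int, m.groups()))  # raises ValueError on impossible dates
        return ("date", d.year, d.month, d.day)
    if m := AMOUNT_RE.match(s):
        value = int(m.group(1).replace(",", "")) * WON_UNITS[m.group(2)]
        # amount() is positive-int only; negatives (e.g. a loss) fall back to number()
        return ("amount", value, "KRW") if value > 0 else ("number", value)
    if m := ORDINAL_RE.match(s):
        return ("ordinal", int(m.group(1)))
    if NUMBER_RE.match(s):
        n = s.replace(",", "")
        return ("number", float(n) if "." in n else int(n))
    return ("string", s)  # not a typed literal; keep as prose


budgets = ["126백만원", "1,200백만원", "90백만원"]
print(sorted(budgets))                                # ['1,200백만원', '126백만원', '90백만원']
print(sorted(budgets, key=lambda s: to_typed(s)[1]))  # ['90백만원', '126백만원', '1,200백만원']
```

The last two lines show the motivating problem: compared as prose strings, "1,200백만원" sorts before "90백만원", while the converted integer values order correctly and can be thresholded.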

Docs/criteria only: no code paths are touched. The file is read at extraction
time, so the changes take effect without a reinstall.

Add a "완전성 원칙" (completeness principle) section so extraction sweeps every section and table row by row instead of skimming prose. Dense tables (participant rosters, financial/registry status, budget line items, schedules, career/patent records) were the main source of silent omissions: narrative got captured while repeated table rows were dropped.

- forbid sampling ("대표 몇 개만", "just a few representative ones") of repeated items
- table → triple mapping rule (row key → subject, header → relation, cell → object)
- judge coverage by a section/table sweep, not by converted-file byte size
- pre-finish self-check, with the existing PII exclusions preserved

justinjoy merged 1 commit (fe814a8) from extraction-exhaustiveness into main on Jun 26, 2026
3 checks passed

justinjoy deleted the extraction-exhaustiveness branch on June 26, 2026 at 11:30

justinjoy changed the title from "Mandate exhaustive fact extraction in text-to-fact criteria" to "Make text-to-fact criteria directive: exhaustive extraction + typed-literal compound terms" on Jun 27, 2026

justinjoy mentioned this pull request on Jun 27, 2026 in "Require compound terms for typed literal objects" #151 (merged)
